In the Claims:

Please allow the amendment of claims 1, 10, 13, and 14 as indicated to put the claims in
condition for allowance.

1. (Currently amended) A method of managing data storage in a data processing apparatus,
the data processing apparatus including an information repository comprising a physical data
storage medium and data structures for storing index information for locating data in the data
storage medium, the method comprising the steps of:

analyzing the contents of the set of files to identify components of the file contents which 
have duplicates within different files within the set; 

deleting duplicate components from the information repository while retaining at least one 
copy of each component, and generating index data for the retained copies which reflects the 
respective logical positions within the information repository corresponding to the positions of the 
retained copies and their deleted duplicates, and generating index data for remainder components 
which correspond to the remainder portions of a file after separation of duplicated components which 
remainder component index data reflects the logical positions of the remainder components within 
the information repository; 

storing the generated index data; 

communicating through an information specification interface with application programs for
analyzing contents, deleting duplicates, and generating indexes, [[enabling the application programs
to]] registering as publishers and as subscribers for information;

publishing an information request message that includes a response topic for an application
program through the information specification interface;

comparing information components created by a first application program with other
application programs' subscriptions using the response topic of each information request; and

notifying an [[identified subscriber]] applications program when a created information
component matches an [[the]] application program's subscriptions in response to monitoring an
information component comprising a specified response topic for the application program.

2. (Original) A method according to claim 1, wherein the analysis of file contents comprises the
steps of: 

separating file contents into a set of information components comprising sub-sections of a 
file's contents, in accordance with predefined separation criteria; and 

analyzing the contents of said information components to identify duplicates. 

3. (Original) A method according to claim 2, wherein the step of separating a file's contents into
information components is initiated in response to a step of saving the file, and the steps of analyzing 
the contents to identify duplicates and then deleting duplicates are performed by a background 
process independently of user-controlled operations. 

4. (Original) A method according to claim 2, wherein said step of separating file contents 
comprises identifying a file type, selecting predefined separation criteria for the identified file type, 
and separating file contents in accordance with the selected separation criteria. 

5. (Original) A method according to claim 1, including the step of identifying information 
components corresponding to sub-sections of an identified component of a file's contents, which
sub-sections have duplicates within different files within the set, and performing in relation to said
sub-section components said steps of deleting duplicates and generating and storing index data for
retained single copies of duplicated sub-section components and generating and storing separate 
index data for remainder sub-section components. 

6. (Original) A method according to claim 5, wherein said steps of deleting duplicates and 
generating separate index data is performed subject to a defined minimum component size. 

7. (Original) A method according to claim 1, wherein the generated index data comprises: 

a set of file descriptions which each include an ordered list of identifiers of components 
corresponding to the contents of the respective file and information defining a path within a directory
structure corresponding to the logical location of the file within the directory structure; and 

a set of unique component identifiers to be stored in association with respective components. 

8. (Original) A method according to claim 7, wherein the index data is implemented using 
markup tags, with each unique component identifier comprising a unique tag pair identifying and 
delimiting the respective component within the information repository and said ordered list of 
component identifiers within each file description comprising a list of markup tags. 

9. (Original) A method according to claim 7, wherein the index data additionally comprises: 

an indication of the locations within the information repository of members of said set of 
unique component identifiers. 

10. (Currently amended) A data processing apparatus comprising:

an information repository for storing a set of files and for storing index information for
locating files within the information repository;

controller components for controlling the operation of the data processing apparatus to 
perform the following method steps 

analyzing the contents of a set of files stored in the information repository to identify 
components of the file contents which have duplicates within different files within the set; 

deleting duplicate components from the information repository while retaining at 
least one copy of each component, and generating index data for the retained copies which 
reflects the respective logical positions within the information repository corresponding to 
the positions of the retained copies and their deleted duplicates, and generating index data for 
remainder components which correspond to the remainder portions of a file after separation 
of duplicated components which remainder component index data reflects the logical 
positions of the remainder components within the information repository; and 

storing the generated index data; and 

a publish/subscribe engine connected by an information specification interface for
communication between application programs and said controller components for analyzing
contents, deleting duplicates and generating indexes, wherein the publish/subscribe engine
registers [[enables]] the application programs [[to register]] as publishers and as subscribers for
information, an application program publishing an information request message that includes a
response topic through the information specification interface, [[and is adapted to]] the
publish/subscribe engine compar[[e]]ing information components created by a first application
program with other application programs' subscriptions, and [[then to]] notifying [[identified
subscriber]] an application[[s]] program when a created information component matches an [[the]]
application program's subscriptions in response to monitoring an information component comprising
a specified response topic for the application program.

11. (Previously presented) A data processing apparatus according to claim 10, wherein the
controller component for generating index data is adapted to generate: 

a set of file descriptions, which each include an ordered list of identifiers of information 
components corresponding to the contents of the respective file and information defining a path 
within a directory structure corresponding to the logical location of the file within the directory 
structure; and 

a set of unique component identifiers to be stored in association with respective components; 

wherein the apparatus further comprises a component for analyzing the index data for all 
components of the set of files to identify and generate a representation of a directory structure. 

12. (Canceled) 

13. (Currently amended) A data processing apparatus according to claim 10, including one or
more search agents for performing search and retrieval operations from the information repository in
response to information requests from one or more application programs.

14. (Currently amended) A computer program product comprising program code recorded on a
computer-readable recording medium, the program code including instructions for controlling the 
operation of a data processing apparatus, when executed thereon, to perform a method for managing 
storage of a set of files within an information repository, the information repository comprising a 
physical data storage medium and data structures for storing index information for locating files in 
the physical data storage medium, wherein the program code comprises: 

means for analyzing the contents of the set of files to identify components of the file contents 
which have duplicates within different files within the set by separating the file contents into a set of 
information components comprising sub-sections of a file's contents, in accordance with predefined
separation criteria and in response to a saving of the file and analyzing the contents of said 
information components to identify duplicates by a background process independently of user- 
controlled operations; 

means for deleting duplicate components from the information repository by a background 
process independently of user-controlled operations while retaining at least one copy of each 
component, and for generating index data for the retained copies which reflects the respective logical 
positions within the information repository corresponding to the positions of the retained copies and 
their deleted duplicates, and for generating index data for remainder components which correspond 
to the remainder portions of a file after separation of duplicated components which remainder 
component index data reflects the logical positions of the remainder components within the 
information repository; 

means for storing the generated index data; 

means for communicating through an information specification interface with application
programs for analyzing contents, deleting duplicates and generating indexes, [[enabling the application
programs to]] registering as publishers and as subscribers for information;

means for publishing an information request message that includes a response topic for an
application program through the information specification interface;

means for comparing information components created by a first application program with
other application programs' subscriptions using the response topic of each information request; and

means for notifying an [[identified subscriber]] applications program when a created information
component matches an application program's subscriptions in response to monitoring an information
component comprising a specified response topic for the application program.